
    The Cerevoice Blizzard Entry 2007: Are Small Database Errors Worse than Compression Artifacts?

    In commercial systems the memory footprint of unit selection systems is often a key issue. This is especially true for PDAs and other embedded devices. In this year's Blizzard entry CereProc® set itself the criterion that the full database system entered would have a smaller memory footprint than either of the two smaller database entries. This was accomplished by applying Speex speech compression to the full database entry. In turn, a set of small database techniques used to improve the quality of small database systems in last year's entry was extended. Finally, for all systems, two quality control methods were applied to the underlying database to improve the lexicon and transcription match to the underlying data. Results suggest that mild audio quality artifacts introduced by lossy compression have almost as much impact on MOS perceived quality as concatenation errors introduced by sparse data in the smaller systems with bulked diphones. Index Terms: speech synthesis, unit selection.

    Stochastic Suprasegmentals: Relationships between Redundancy, Prosodic Structure and Syllabic Duration

    Within spontaneous speech there are wide variations in the articulation of the same word by the same speaker. This paper explores two related factors which influence variation in articulation, prosodic structure and redundancy. We argue that the constraint of producing robust communication while efficiently expending articulatory effort leads to an inverse relationship between language redundancy and care of articulation. The inverse relationship improves robustness by spreading the information more evenly across the speech signal leading to a smoother signal redundancy profile. We argue that prosodic prominence is a linguistic means of achieving smooth signal redundancy. Prosodic prominence increases care of articulation and coincides with unpredictable sections of speech. By doing so, prosodic prominence leads to a smoother signal redundancy. Results confirm the strong relationship between prosodic prominence and care of articulation as well as an inverse relationship between language redundancy and care of articulation. In addition, when variation in prosodic boundaries is controlled for, language redundancy can predict up to 65% of the variance in raw syllabic duration. This is comparable with 64% predicted by prosodic prominence (accent, lexical stress and vowel type). Moreover most (62%) of this predictive power is shared. This suggests that, in English, prosodic structure is the means with which constraints caused by a robust signal requirement are expressed in spontaneous speech.
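
The idea of predictive power being "shared" between two predictor sets can be illustrated with a standard two-set commonality decomposition. This is a minimal sketch only: the abstract does not say how the shared proportion was computed, the function name `partition_variance` is hypothetical, and the input values below are illustrative and not taken from the paper.

```python
def partition_variance(r2_a, r2_b, r2_both):
    """Split explained variance into unique and shared parts.

    r2_a    -- R^2 of a model using predictor set A alone
    r2_b    -- R^2 of a model using predictor set B alone
    r2_both -- R^2 of a model using A and B together
    Returns (unique_a, unique_b, shared) as proportions of total variance.
    """
    unique_a = r2_both - r2_b        # what A adds once B is already in the model
    unique_b = r2_both - r2_a        # what B adds once A is already in the model
    shared = r2_a + r2_b - r2_both   # variance either set could account for
    return unique_a, unique_b, shared


# Illustrative values only (not results from the paper).
unique_a, unique_b, shared = partition_variance(0.5, 0.4, 0.6)
print(round(unique_a, 3), round(unique_b, 3), round(shared, 3))
```

When the shared term dominates both unique terms, the two predictor sets are largely explaining the same variance, which is the pattern the abstract describes for language redundancy and prosodic prominence.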

    Generating Narratives from Personal Digital Data: Using Sentiment, Themes, and Named Entities to Construct Stories

    As the quantity and variety of personal digital data shared on social media continue to grow, how can users make sense of it? There is growing interest among HCI researchers in using narrative techniques to support interpretation and understanding. This work describes our prototype application, ReelOut, which uses narrative techniques to allow users to understand their data as more than just a database. The online service extracts data from multiple social media sources and augments it with semantic information such as sentiment, themes, and named entities. The interactive editor automatically constructs a story by using unit selection to fit data units to a simple narrative structure. It allows the user to change the story interactively by rejecting certain units or selecting a new narrative target. Finally, images from the story can be exported as a video clip or a collage.

    Designing Interactions with Multilevel Auditory Displays in Mobile Audio-Augmented Reality

    Auditory interfaces offer a solution to the problem of effective eyes-free mobile interactions. In this article, we investigate the use of multilevel auditory displays to enable eyes-free mobile interaction with indoor location-based information in non-guided audio-augmented environments. A top-level exocentric sonification layer advertises information in a gallery-like space. A secondary interactive layer is used to evaluate three different conditions that varied in the presentation (sequential versus simultaneous) and spatialisation (non-spatialised versus egocentric/exocentric spatialisation) of multiple auditory sources. Our findings show that (1) participants spent significantly more time interacting with spatialised displays; (2) using the same design for primary and interactive secondary display (simultaneous exocentric) showed a negative impact on the user experience, an increase in workload and substantially increased participant movement; and (3) the other spatial interactive secondary display designs (simultaneous egocentric, sequential egocentric, and sequential exocentric) showed an increase in time spent stationary but no negative impact on the user experience, suggesting a more exploratory experience. A follow-up qualitative and quantitative analysis of user behaviour supports these conclusions. These results provide practical guidelines for designing effective eyes-free interactions for far richer auditory soundscapes.

    My Life On Film

    Social media has begun to migrate from a predominantly text-based medium, through photography and into cinematography and edited video. Film is a vital medium through which we not only capture our world, but also seek to understand it. This workshop explores an emerging area of research within the CHI community that focuses on applying filmic techniques in two different ways: 1) to automatically interpret personal data and to allow users to interact with personal data, and 2) to explore film as a vehicle for the personal curation of digital identity. This multidisciplinary, one-day workshop will bring together social scientists, cinematography experts, ethnographers, and semantic and graphics engineers with general HCI practitioners to explore and evaluate individual and community representations on film, new ways of translating traditional social media data into film, the engineering challenges of automatically rendering filmic media, and the critical role such automatic and semi-automatic systems can play in persuasion, understanding, and empowerment.

    A life story in three parts: the use of triptychs to make sense of personal digital data

    Many social media platforms support the curation of personal digital data, and, more recently, the use of that data for review and reflection. We explored the process of reflection by asking users to create a meaningful ‘triptych’ of photographs drawn from their Facebook accounts. In a first study, we asked participants to manually trawl their own accounts and select three relevant images, which we then framed and used as an interview probe. In a second study, we designed an automated triptych generation system and assessed participants’ experiences of using this system. We conducted qualitative analyses of participant interviews from both studies. Consistent with other ‘slow technology’ work, we found the act of creating a physical artefact from social media data gave that data new meaning, albeit with notable differences between manually and automatically generated triptychs. We conclude by discussing possible improvements to the design of the automated triptych system.

    Faulty cardiac repolarization reserve in alternating hemiplegia of childhood broadens the phenotype

    Alternating hemiplegia of childhood is a rare disorder caused by de novo mutations in the ATP1A3 gene, expressed in neurons and cardiomyocytes. As affected individuals may survive into adulthood, we use the term 'alternating hemiplegia'. The disorder is characterized by early-onset, recurrent, often alternating, hemiplegic episodes; seizures and non-paroxysmal neurological features also occur. Dysautonomia may occur during hemiplegia or in isolation. Premature mortality can occur in this patient group and is not fully explained. Preventable cardiorespiratory arrest from underlying cardiac dysrhythmia may be a cause. We analysed ECG recordings of 52 patients with alternating hemiplegia from nine countries: all had whole-exome, whole-genome, or direct Sanger sequencing of ATP1A3. Data on autonomic dysfunction, cardiac symptoms, medication, and family history of cardiac disease or sudden death were collected. All had 12-lead electrocardiogram recordings available for cardiac axis, cardiac interval, repolarization pattern, and J-point analysis. Where available, historical and prolonged single-lead electrocardiogram recordings during electrocardiogram-videotelemetry were analysed. Half the cohort (26/52) had resting 12-lead electrocardiogram abnormalities: 25/26 had repolarization (T wave) abnormalities. These abnormalities were significantly more common in people with alternating hemiplegia than in an age-matched disease control group of 52 people with epilepsy. The average corrected QT interval was significantly shorter in people with alternating hemiplegia than in the disease control group. J wave or J-point changes were seen in six people with alternating hemiplegia. Over half the affected cohort (28/52) had intraventricular conduction delay, or incomplete right bundle branch block, a much higher proportion than in the normal population or disease control cohort (P = 0.0164). 
Abnormalities in alternating hemiplegia were more common in those ≥16 years old, compared with those <16 (P = 0.0095), even with a specific mutation (p.D801N; P = 0.045). Dynamic, beat-to-beat or electrocardiogram-to-electrocardiogram, changes were noted, suggesting the prevalence of abnormalities was underestimated. Electrocardiogram changes occurred independently of seizures or plegic episodes. Electrocardiogram abnormalities are common in alternating hemiplegia, have characteristics reflecting those of inherited cardiac channelopathies and most likely amount to impaired repolarization reserve. The dynamic electrocardiogram and neurological features point to periodic systemic decompensation in ATP1A3-expressing organs. Cardiac dysfunction may account for some of the unexplained premature mortality of alternating hemiplegia. Systematic cardiac investigation is warranted in alternating hemiplegia of childhood, as cardiac arrhythmic morbidity and mortality are potentially preventable.

    Disfluency and speech recognition profile factors

    This paper reports on work bringing together disfluency coding carried out by Lickley [7] and recognition work carried out as part of the ERF project (Bard, Thompson & Isard, [2]) at Edinburgh University. A set of factors is investigated which characterise the behaviour of the ASR during recognition based on an analysis of the resulting word lattice. These factors can be grouped as: Entropy Factors – the entropy of the acoustic and language model likelihoods, within the word lattice, over a 10 ms frame, and, Arc Factors – the number of non-unique and unique arcs in the word lattice in any given 10 ms time frame, together with the variance of start and end times of these arcs, and the number of arcs starting or ending in the frame. The values of all factors were used to train a simple CART model. The CART model was used to predict: recognition failure, interruption point location (the point where a disfluency begins), and whether the location was in a repair or a reparandum. The entropy of the language model values contributed most to the model's prediction of recognition failure, and whether a frame was in a repair or reparandum. In contrast, the number of unique word hypotheses contributed most to the successful prediction of a frame being close to an interruption point.
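
The entropy and arc factors described above could be computed per frame roughly as follows. This is a hedged sketch, not the authors' implementation: the arc representation (dicts with `word`, `start`, `end`, `loglik`), the normalisation of likelihoods into a probability distribution, and the function names are all assumptions made for illustration.

```python
import math


def arcs_in_frame(arcs, frame_start, frame_len=0.010):
    """Return the lattice arcs that overlap a frame (times in seconds)."""
    frame_end = frame_start + frame_len
    return [a for a in arcs if a["start"] < frame_end and a["end"] > frame_start]


def frame_entropy(log_likelihoods):
    """Shannon entropy in bits of the normalised arc likelihoods in one frame."""
    if not log_likelihoods:
        return 0.0
    peak = max(log_likelihoods)  # subtract the max for numerical stability
    weights = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0)


def frame_factors(arcs, frame_start, frame_len=0.010):
    """Collect a few per-frame factors (assumed feature set, for illustration)."""
    active = arcs_in_frame(arcs, frame_start, frame_len)
    return {
        "entropy": frame_entropy([a["loglik"] for a in active]),
        "n_arcs": len(active),
        "n_unique_words": len({a["word"] for a in active}),
    }


# Hypothetical lattice fragment: two competing hypotheses, then a later arc.
lattice = [
    {"word": "the", "start": 0.00, "end": 0.10, "loglik": -2.0},
    {"word": "a", "start": 0.00, "end": 0.10, "loglik": -2.0},
    {"word": "cat", "start": 0.20, "end": 0.40, "loglik": -1.0},
]
print(frame_factors(lattice, 0.03))
```

Per-frame dictionaries of this kind could then serve as feature rows for a decision-tree learner, in the spirit of the CART model the abstract mentions.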